Clustering analysis of vegetation data

Authors

Valentin Gjorgjioski

Sašo Dzeroski

Matt White

Abstract

Vegetation may be described as the plant life of a region. Studying patterns and processes in vegetation at various scales of space and time is useful for understanding landscapes, ecological processes and environmental history, and for predicting ecosystem attributes such as productivity. Generalized vegetation descriptions, maps and other graphical representations of vegetation types have become fundamental to land use planning and management. They are widely used as biodiversity surrogates in conservation assessments and can provide a useful summary of many non-vegetation landscape elements such as animal habitats, agricultural suitability and the location and abundance of timber and other forest resources. We use clustering or classification of vegetation data to obtain such descriptions, maps and other representations. Clustering vegetation data is a well-known machine learning problem that aims to partition the data set into subsets so that the data in each subset share some common trait. Summaries of vegetation classification and its methods can be found in the numerous texts that focus on this discipline [6,3]. In our work we deal with vegetation data organized in a relational model. To apply classical machine learning approaches, we first need to preprocess the data. We preprocess the data using simple aggregation techniques and analyze it with several approaches: predictive clustering trees [1], k-Means and hierarchical agglomerative clustering. These algorithms were applied and satisfactory results were obtained. The rest of the paper is organized as follows. First, we discuss the dataset and the problem in detail. We then describe the preprocessing needed to make the data suitable for classical data mining approaches, and in the following section we describe our data mining setup and experiments. Next, we present the results of the experiments, and we conclude with a discussion and proposals for further work.
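The abstract's pipeline of aggregating relational records into a flat table and then clustering can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (`site_id`, `species`, `cover_percent`), the sample values and the function names are all hypothetical, and only k-Means is shown, as a plain Lloyd's iteration with farthest-first seeding chosen here for determinism.

```python
from collections import defaultdict

# Hypothetical relational records: (site_id, species, cover_percent).
# Each site has a variable number of rows, so it cannot be clustered directly.
records = [
    ("s1", "eucalyptus", 60.0), ("s1", "acacia", 10.0),
    ("s2", "eucalyptus", 55.0), ("s2", "acacia", 15.0),
    ("s3", "spinifex", 70.0),
    ("s4", "spinifex", 65.0), ("s4", "acacia", 5.0),
]

def aggregate(records):
    """Flatten site/species rows into one fixed-length vector per site."""
    species = sorted({sp for _, sp, _ in records})
    index = {sp: i for i, sp in enumerate(species)}
    table = defaultdict(lambda: [0.0] * len(species))
    for site, sp, cover in records:
        table[site][index[sp]] += cover  # simple aggregation: sum of cover
    sites = sorted(table)
    return sites, species, [table[s] for s in sites]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

sites, species, matrix = aggregate(records)
labels = kmeans(matrix, 2)
for site, label in zip(sites, labels):
    print(site, label)
```

Summing cover per species is only one possible aggregation; counts, means or presence/absence flags would fit the same structure, and the resulting matrix could equally be passed to a hierarchical agglomerative method.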

Similar resources

Climate Classifications: the Value of Unsupervised Clustering

Classifying the land surface according to different climate zones is often a prerequisite for global diagnostic or predictive modelling studies. Classical classifications such as the prominent Köppen–Geiger (KG) approach rely on heuristic decision rules. Although these heuristics may transport some process understanding, such a discretization may appear “arbitrary” from a data oriented perspect...


Using Clustering and Factor Analysis in Cross Section Analysis Based on Economic-Environment Factors

Homogeneity of groups is important in studies that use cross-section and multi-level data. Most studies in economics, especially panel data analyses, need some kind of homogeneity to ensure the validity of their results. This paper presents methods known as clustering and homogenization of groups in cross-section studies based on enviro-economic components. For this, a sample of 92 countries which...


An Improved SSPCO Optimization Algorithm for Solve of the Clustering Problem

Swarm Intelligence (SI) is an innovative artificial intelligence technique for solving complex optimization problems. Data clustering is the process of grouping data into a number of clusters. The goal of data clustering is to make the data in the same cluster share a high degree of similarity while being very dissimilar to data from other clusters. Clustering algorithms have been applied to a ...


Modern pollen analysis: a reliable tool for discriminating Quercus rotundifolia communities in Central Spain

The paucity of modern pollen-rain data from the Iberian Peninsula is a significant barrier to understanding the Late Quaternary vegetation history of this globally important southwestern Mediterranean region. The relationships between current vegetation, the available environmental data and modern pollen are examined in Central Spain for both natural and human-induced vegetation types, as an a...
